1 research outputs found

    Provenance-Aware CXXR

    Get PDF
    A provenance-aware computer system is one that records information about the operations it performs on data to enable it to provide an account of the process that led to a particular item of data. These systems allow users to ask questions of data, such as “What was the sequence of steps involved in its creation?”, “What other items of data were used to create it?”, or “What items of data used it during their creation?”. This work will present a study of how, and the extent to which the CXXR statistical programming software can be made aware of the provenance of the data on which it operates. CXXR is a variant of the R programming language and environment, which is an open source implementation of S. Interestingly S is notable for becoming an early pioneer of provenance-aware computing in 1988. Examples of adapting software such as CXXR for provenance-awareness are few and far between, and the idiosyncrasies of an interpreter such as CXXR—moreover the R language itself—present interesting challenges to provenance-awareness: such as receiving input from a variety of sources and complex evaluation mechanisms. Herein presented are designs for capturing and querying provenance information in such an environment, along with serialisation facilities to preserve data together with its provenance so that they may be distributed and/or subsequently restored to a CXXR session. Also presented is a method for enabling this serialised provenance information to be interoperable with other provenance-aware software. This work also looks at the movement towards making research reproducible, and considers that provenance-aware systems, and provenance-aware CXXR in particular, are well positioned to further the goal of making computational research reproducible
    corecore